An Executable Sequential Specification for Spark Aggregation
Abstract
Spark is a promising new platform for scalable data-parallel computation. It provides several high-level application programming interfaces (APIs) for parallel data aggregation. Because parallel aggregation in Spark executes non-deterministically, a natural requirement is that a Spark program give the same result for every execution on the same data set. We present PURESPARK, an executable formal specification of Spark's aggregate combinators, written in Haskell. The specification allows us to deduce the precise condition under which Spark aggregation yields deterministic outcomes. We report case studies analyzing deterministic outcomes and correctness of Spark programs.
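The determinism requirement described in the abstract can be illustrated with a small sequential model. The sketch below is not the PURESPARK specification (which is written in Haskell) and does not call the Spark API; the names `aggregate` and `outcomes` are hypothetical. It assumes that each partition is folded locally and that partial results may then be combined in any order, and it enumerates those orders to see whether the result is unique.

```python
from functools import reduce
from itertools import permutations


def aggregate(zero, seq_op, comb_op, partitions):
    """Model ONE possible execution of an aggregate over partitioned data.

    Assumption: each partition is folded with seq_op starting from zero,
    and the partial results are then combined with comb_op in list order.
    """
    partials = [reduce(seq_op, part, zero) for part in partitions]
    return reduce(comb_op, partials, zero)


def outcomes(zero, seq_op, comb_op, partitions):
    """Collect the results over every order in which partitions may be combined."""
    return {
        aggregate(zero, seq_op, comb_op, list(order))
        for order in permutations(partitions)
    }


def add(a, b):
    return a + b


# Integer addition is associative and commutative: a single outcome.
numeric_parts = [[1, 2], [3], [4, 5]]
sum_results = outcomes(0, add, add, numeric_parts)
print(sum_results)  # {15}

# String concatenation is associative but not commutative:
# the outcome depends on the combination order.
string_parts = [["a", "b"], ["c"], ["d"]]
concat_results = outcomes("", add, add, string_parts)
print(len(concat_results))  # 6
```

In this model, summation yields one outcome regardless of order, while concatenation yields a different string for each of the six combination orders, so it would not satisfy the "same result for any execution" requirement.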
Similar resources
Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications
MapReduce is a popular programming paradigm for running large-scale data-intensive computation. Recently, many frameworks that implement this paradigm have been developed. To leverage such frameworks, however, developers need to familiarize themselves with each framework's API and rewrite their code. We present Casper, a new tool that automatically translates sequential Java programs to the MapReduce parad...
Formal Foundations for the Generation of Heterogeneous Executable Specifications in SystemC from UML/MARTE Models
Embedded system heterogeneity leads to the need to understand the system as an aggregation of components in which different behavioural semantics should cohabit. Heterogeneity has two dimensions. On the one hand, during the design process, different execution semantics, specifically in terms of time (untimed, synchronous, timed) can be required in order to provide specific behaviour characteris...
Verifying Interlevel Relations within Multi-Agent Systems: formal theoretical basis
In the general case, at any aggregation level a behavioral specification for a multi-agent system component consists of dynamic properties expressed by complex temporal relations in TTL, which therefore does not allow direct application of automatic verification procedures, more specifically, model checking techniques, used in this paper. In order to apply model checking techniques it is needed...
Automatic Generation of CSP || B Skeletons from xUML Models
CSP ‖ B is a formal approach to specification that combines CSP and B. In this paper we present our tool that automatically translates a subset of executable UML (xUML) models into CSP ‖ B, for the purpose of verification and increased validation at the early stages of a software engineering development lifecycle. The tool is being developed for our industrial collaborators, AWE plc, in order t...
Executable UML and SPARK Ada: The Best of Both Worlds
Executable UML is a well-defined UML subset supported by an Action Language that enables the construction of executable models from which reliable target code can be automatically generated. SPARK Ada is a safe Ada subset with formal annotations that renders programs amenable to static analysis and formal verification. This paper describes a hybrid approach where formally annotated Executable U...